Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect options from baseline #124

Merged
merged 9 commits into from
Feb 6, 2019

Conversation

killuazhu
Copy link
Contributor

Fully implements the proposal in #121 (comment). More than half of the changes are new test cases.

Some logic can be implemented in different places, I chose the current way based on my understanding. Let me know if you have any suggetsions or comments.

FYI @KevinHock @domanchi
CC @jribm

Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test cases! Fix and ship!

if args.import_filename:
plugins = initialize.merge_plugin_from_baseline(
_get_plugin_from_baseline(args.import_filename), args,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this block inside _perform_scan, right after we get the old_baseline.

e.g.

def _perform_scan(args, plugins):
    old_baseline = _get_existing_baseline(args.import_filename)
    if old_baseline:
        plugins = initialize.merge_plugin_from_baseline(...)

This way, we don't need to incur two file reads. Also, I'm not entirely sure that reading from stdin would work twice.

) == 0

print("Used:", file_writer.call_args[1]['data']['plugins_used'])
print("Wrote:", plugins_wrote)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove these print statements, before merging.

@@ -44,17 +44,53 @@ def test_file_with_secrets(self, mock_log):
def test_file_no_secrets(self):
assert_commit_succeeds('test_data/files/file_with_no_secrets.py')

def test_baseline(self):
@pytest.mark.parametrize(
' has_result, use_private_key_scan,, hook_command, commit_result',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra comma? maybe s/commit_result/commit_succeeds? (also prefixed space, but less fussy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, extra comma. The rename sounds good. I will update in a new commit.

},
],
),
( # ignore overwriten option from CLI when not using --use-all-plugins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. If you explicitly override it by providing the CLI argument, even without --use-all-plugins, wouldn't you expect CLI to take precedence over a configured baseline?

Example use case: user wants to see if a decrease in sensitivity results in a large drop in noise.

$ detect-secrets scan --update .secrets.baseline --base64-limit 5
$ git diff .secrets.baseline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated about that, it might have some edge cases. For example, if the baseline does not have the base64 scan, do we ignore the limit? or add the base64 scan in? So I chose to let the user on purposely use the plugin, then allow adjusting. How would you suggest to adjust?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point you brought up.

What about displaying a warning when they don't have the plugin installed, then ignoring it? Seems to accomplish several things:

  1. Allows CLI arguments to configure limits, when they are relevant.
  2. Ignores CLI arguments when not relevant.
  3. Informs the user that arguments passed are redundant, yet not hard blocking due to poor invocation of command.
$ detect-secrets scan --update .secrets.baseline --base64-limit 5
WARN: --base64-limit specified, but Base64HighEntropyString not configured! Ignoring...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's updated to support these cases now. New test cases added.

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great so far

return plugins


def _merge_plugin_from_baseline(baseline_plugins, args):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to _trim_disabled_plugins_from_baseline, it would make the make the comments in merge_plugin_from_baseline less needed.



def _merge_plugin_from_baseline(baseline_plugins, args):
merged_plugins_dict = {vars(plugin)['name']: plugin for plugin in baseline_plugins}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great use of vars 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Credit to @meneal

@killuazhu
Copy link
Contributor Author

@domanchi ready for another round.

Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Thanks for writing that hairy logic of plugin prioritization.


# input param priority > baseline
input_plugins_dict = dict(args.plugins)
for plugin_name, plugin_params in list(input_plugins_dict.items()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to turn input_plugins_dict.items() into a list. It gives us an iterator, so we get some performance boosts by avoiding turning it into a list, before iterating through it.

Same goes for other uses of this behavior.

if args.disabled_plugins:
for plugin_name in args.disabled_plugins:
if plugin_name in plugins_dict:
plugins_dict.pop(plugin_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins_dict = {
    plugin_name: plugin_params
    for plugin_name, plugin_params in baseline_plugins_dict.items()
    if plugin_name not in args.disabled_plugins
}

log.warning(
'%s specified, but %s not configured! Ignoring...'
% ("".join(["--", param_name.replace("_", "-")]), plugin_name),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this DRYer, seeing that we use it below as well? One idea might be to implement an iterator, or just encapsulate it in a function. For example:

def get_prioritized_parameters(plugins, is_default_plugins_map, prefer_default=True):
    """
    :type is_default_plugins_map: dict(str => bool)
    :param is_default_plugins_map: mapping of parameter name to whether its value is derived
        from a default value.

    :param prefer_default: if True, will yield if plugin parameters are from default values.
        Otherwise, will yield if plugin parameters are *not* from default values.
    """
    ...
    yield plugin_name, param_name, param_value

Then, you can do:

plugins_dict = dict(args.plugins)
for plugin_name, param_name, param_value in get_prioritized_parameters(
    input_plugins_dict,
    args.param_from_default,
    prefer_default=False,
):
    try:
        plugins_dict[plugin_name][param_name] = param_value
    except KeyError:
        ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been updated now. Although having more lines, but I do think the logic is a little bit more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearer logic >> more lines, especially when it comes to long term maintenance 😸

@@ -309,6 +309,7 @@ def consolidate_args(args):

active_plugins = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put a comment here that explains the rationale for why we're returning these new values.

I understand the purpose of doing it, and that there's really no better way to incorporate the baseline values in it. It's just unfortunate that this hairy logic to perform prioritization of plugins needs to live in two places (unlike this doc string suggests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disabled_plugins is mainly used to filter off some disabled plugins when reading the plugin list from baseline. Relates to the comment below, it's possible to generate the disabled_plugins list on-demand by calculating the difference between all plugins and active plugins. I can add a helper function to the usage.py.

I was using disabled_plugins earlier to avoid re-iterating on the all plugins to get such list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Makes sense.

Thanks for adding the helper function anyway!

@@ -309,6 +309,7 @@ def consolidate_args(args):

active_plugins = {}
disabled_plugins = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, what is our purpose of returning a set of disabled_plugins? Since args.plugins is a collection of all active plugins (combining default values and specifically set values), wouldn't any plugin that isn't in this dictionary be disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

@@ -309,6 +309,7 @@ def consolidate_args(args):

active_plugins = {}
disabled_plugins = {}
param_from_default = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe is_default_parameter, so it's clearer that it contains boolean values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I would make a suggestion to rename it to is_using_default_value

@killuazhu
Copy link
Contributor Author

Thanks for writing that hairy logic of plugin prioritization.

It definitely makes the job more challenge when I have to convert the BasePlugin tuple into a dictionary, then compares and sets values. I noticed the BasePlugin class has made __dict__ field itself a property instead of the exact properties such as name, base64_limit and etc. This prevents the {set|has|get}attr from reading and writing values. May I ask what was the rationale of overwriting __dict__ instead of directly using property on each field? @domanchi

@killuazhu
Copy link
Contributor Author

@domanchi all comments addressed.

Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask what was the rationale of overwriting __dict__ instead of directly using property on each field?

Seeing that this package is designed with baseline readability in mind, I wanted an easy way for each plugin to output information that:

  1. Was easy for humans to look at, and reason about
  2. Was sufficient information to initialize the plugin from

I think the baseline output seems to achieve those goals.

If I'm interpreting your question correctly, I think for the most part, you can use the property on each field, with the exception that different plugins may have different properties. And I certainly don't have any qualms against something like:

class BasePlugin:
    self.name = self.__class__.__name__

except that I didn't need it at the time :)

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome 👍

@KevinHock KevinHock merged commit 462ade6 into Yelp:master Feb 6, 2019
@killuazhu killuazhu deleted the contribute-respect-option branch January 9, 2020 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants